Although an essential gene should not, in theory, attract any transposon insertion,
in practice a transposon may still be found in an essential gene for several
biological reasons [Lamichhane, et al., 2003; Deng, et al., 2013; Fels, et al.,
2013]. For instance, the essentiality of a gene may not be disrupted if a
transposon is inserted within the distal regions of the gene, because the
truncated gene product may still retain its function. Often, a gene may also
remain essential if a transposon is inserted at only a few sites within it.
In addition to these biological reasons, there are also technical reasons why
the genome-wide transposon insertion pattern must be examined carefully to avoid
missing essential genes. First, transposon sequencing technology is not yet
accurate enough to distribute transposons to the locations where they should be
without any bias. Second, aligning short transposon sequencing reads to a
reference genome is not free of error; the shorter the reads used in the
alignment, the greater the error. Therefore, a few insertions observed in a gene
may be caused by either or both of the errors mentioned above.
For the aforementioned reasons, three updated learning approaches have been
proposed for robust analysis of gene essentiality patterns using high-throughput
transposon sequencing data. First, a gene with a few insertion sites may be
classified as an essential gene [...ge, et al., 2009]. Second, a gene with a few
insertions may also be classified as an essential gene [Zomer, et al., 2012].
Third, a gene with transposon insertions only in its distal regions may also be
classified as an essential gene [Yang, et al., 2017].
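The three criteria above can be illustrated with a minimal Python sketch that computes per-gene insertion statistics. It is not the method of any cited study. The `Gene` record, the function names, the definition of a distal region (the outer `edge_frac` of a gene at either end), the reading of "insertions" as supporting read counts, and the thresholds `max_sites` and `max_reads` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int


def gene_stats(gene, insertions, edge_frac=0.1):
    """Summarise the insertions that fall inside one gene.

    `insertions` maps a genome position to the number of reads supporting an
    insertion there. `edge_frac` is an assumed definition of the distal
    regions: the outer fraction of the gene at either end.
    """
    length = gene.end - gene.start + 1
    margin = edge_frac * length
    # Unique insertion sites inside the gene and the reads supporting them.
    sites = [pos for pos in insertions if gene.start <= pos <= gene.end]
    reads = sum(insertions[pos] for pos in sites)

    def is_distal(pos):
        # Distal = fewer than `margin` bases from either end of the gene.
        return (pos - gene.start) < margin or (gene.end - pos) < margin

    # A gene with no insertions is not "distal-only"; criterion 1 covers it.
    only_distal = bool(sites) and all(is_distal(pos) for pos in sites)
    return {"sites": len(sites), "reads": reads, "only_distal": only_distal}


def flag_essential(stats, max_sites=2, max_reads=10):
    """Flag a candidate essential gene if any of the three criteria holds.

    max_sites and max_reads are illustrative thresholds, not published values.
    """
    few_sites = stats["sites"] <= max_sites   # criterion 1: few insertion sites
    few_reads = stats["reads"] <= max_reads   # criterion 2: few insertions (as reads, assumed)
    return few_sites or few_reads or stats["only_distal"]  # criterion 3: distal only


# Usage with made-up toy data (not real sequencing results).
genes = [Gene("geneA", 100, 199), Gene("geneB", 300, 399)]
insertions = {102: 40, 105: 30, 195: 25, 310: 50, 320: 15, 350: 80, 390: 20}
for g in genes:
    s = gene_stats(g, insertions)
    print(g.name, s, flag_essential(s))
```

In this toy data, geneA has three well-supported insertion sites, so it passes the first two criteria, but all of them lie near its ends and it is flagged by the distal-only criterion. Scanning every insertion for every gene is quadratic; for a whole genome, sorting the insertion positions and locating each gene's range with `bisect` would be the usual optimisation.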
Density estimation approaches have long been used for discovering essential
genes. In the context of gene essentiality analysis, the principle of density
estimation is to reconstruct a density function, which is unknown in advance,
for a transposon statistic such as the number of transposon insertions per gene
or the number of transposon insertion sites per gene. The constructed density
model is then used for classification and prediction. The underlying assumption
is that the estimated density should show a bimodal distribution, from which a
cut-off point can be determined to separate genes into two clusters. After a